Assignment for Homework 3: Parts 5-6: characterizing co-variability between 2 fields.
5. Scatter plot, correlation and covariance, regression-explained variance
Based on your data fields (which you've seen pictures of), make subsets of your 2 variables x and y and make a scatter plot of these showing the strongest (positive or negative) correlation of one field with the other you can find. The subset might simply be all (x,t) values if your fields are very similar (olr, precip), or maybe the 240 time values at one longitude, or 144 longitudinal values in the time mean, or time series at different longitudes if some variability is offset in your two fields (like pressure and wind).
Now consider the covariance and correlation of the two subset arrays entering your scatterplot.
What is the correlation coefficient corresponding to this scatter plot? rho = corrcoef(x(:),y(:)) rho = correlate(x,y)
What are the standard deviations of your two data subsets? std(x(:))stdev(y)
What fraction of the variance of y can be 'explained' by linear regression on x (y = mx + b)? How does this relate to rho? How much y variance is explained? (variance: with units of y squared) What is m? Hint: these are simple questions: use the math formula, not a computer code (Hsieh section 1.4.2, Eq. 1.33).
What fraction of the variance of x can be 'explained' by linear regression on variable y? (x = nx + a)? How does this relate to rho? What is n? Hint: these are simple questions, use the math formula not computer code.
Now add uncorrelated (random) noise with variance 1 to one of your variables. This might be like observation error. noisey = y + random('Normal',0,1,size(y))
How did the variance of y change when this noise was added? var(y(:)) var(noisey(:))
How did the correlation change? rho = corrcoef(x(:),noisey(:)) rho = correlate(x, noisey)
How do these changes affect the regression of y on x? How much (y+noise) variance is explained by linear regression on x? What is the new value of m in the new (noisey = mx + b) regression?
Hint: all these could be answered without using the computer, but it may help to confirm with data
6. Lagged correlation, covariance, and cross-covariance: hey let's compute all vs. all
Rather than pick time series at one longitude to correlate with other time series, let's use matrix power to fill out a whole 2D (144x144 or 240x240) covariance matrix
or what the heck 3D arrays: the covariance matrix as a function of longitude and time lag (144 x 144 x NLAGS).
For these ENSO type anomalies, let's do NLAGS = 49 [-24 to 24 months]. Everything we might want to know about correlations among our variables is a slice or section of that big covariance array.
Let's work from the anomx and anomy arrays (built in part 1 by subtracting the climatological annual cycle at each longitude). This is traditional climate statistics, although notice that of course we could look at spatial instead of temporal covariances, define anomalies from spatial rather than temporal averages, etc. if we cared to.
Code for computing covariance arrays:
Same code lines for both Matlab and IDL: MAXLAG = 24; % out to +/- 24 months NLAGS = 2*MAXLAG + 1; % 49 lags are considered
% make a lag vector, for a plot axis later lags = linspace(-MAXLAG, MAXLAG, NLAGS); % lag vector, for plot axis
%% Matlab, upon reading our files, sees them as (144 x 240) arrays. %% The first dimension, 144, is called "rows" for whatever reason, %% while the second dimension, 240, is called "columns". %% Statistics functions assume the "columns" are the structural dimension while %% "rows" are the statistical dimension to be averaged over. %% So if we want the TEMPORAL covariance of our TEMPORAL anomalies, %% as a function of longitude, we need to transpose the second matrix rather than the first: %% covxx = anomx*transpose(anomx) /240.; %% OR we could just use the cov() or corrcoef() functions,
%% but we'd need to transpose: varx = cov( transpose(anomx)); %% Frustratingly, cov(x,y) and corrcoef(x,y) give stupid-sized outputs (2x2), %% although cov(x) and corrcoef(x) are sensible. who wants that?? Frigging matlab. %% Let's just do it more directly, manually, by matrix multiplication.
%%%% Declare the desired arrays: lagged temporal covariance, as function of space (144). lagcovxy_temporal = zeros(144,144, NLAGS); lagcovxx_temporal = zeros(144,144, NLAGS); lagcovyy_temporal = zeros(144,144, NLAGS);
% explicit loop over lag, to fill the arrays of lagged covariance. % Must trim off data within MAXLAG of the ends of the time series, so it's a bit fussy:
NTSTATS = 240. - NLAGS; % trimmed off ends, so sum is over 191 months, not 240 % for ilag = 1:NLAGS % lag index 1-49 for ilag = 1:NLAGS-1 % lag index 1-48 only!!! (bugs/ matlab sux)
% UMM BETTER JUST USE NLAGS-1 in the above because it chokes on 49
% Remember covariance is correlation coefficient times the standard
% deviations of the 2 variables. So make stdev 144 element arrays
% for converting covariances to correlations
stdx = squeeze(std(transpose(anomx))); % 144 element vector: std(anomx) at each longitude stdy = squeeze(std(transpose(anomy)));
stdxrep = repmat(stdx,144,1); % replicate matrix to 144 x 144 stdyrep = repmat(stdy,144,1);
figure(4) plot(stdx) title('OLR stdev as a function of lon')
% Make a few plots
figure(3) contourf( squeeze(lagcovxx_temporal(:,80,:)) ) % covariance as a function of lag for longitude #80 = 197.5E
title('COVARIANCE of OLR (lon=197.5) vs lon and lag in [-24,24] months')
; covxx = matrix_multiply (anomx, transpose(anomx));THIS ONE IS 144x144
; %%% explicit loop to fill the array. Have to trim data within MAXLAG
; of the end of the time series.
NTSTATS = 240-NLAGS
.run for ilag = 0, NLAGS-1 do begin timelag = ilag-MAXLAG lagcovxx_temporal(*,*, ilag) = matrix_multiply( anomx(*, MAXLAG :239-MAXLAG), $ transpose( anomy(*, MAXLAG+timelag:239-MAXLAG+timelag) ))/NTSTATS; endfor
end
6. Lagged correlation, covariance, and cross-covariance: questions
Show the zero-lag spatial covariance and correlation structures for your primary field, like OLR_anoms_covar_correl.BEM.Matlab.png this for OLR. (please label the axes better than I did!) Interpret the results.
Show longitude-lag sections of the covariance or correlation of this field, for a base point at some longitude of interest. Like this for OLR at a central Pacific longitude: OLR.lagregression.BEM.jpg (Please label the axes better than I did in this example! I hate Matlab). Better in IDL: olr_lag_covariances.gif
Intepret the results in terms of the characteristic space and time scales of your anomalies. Can you see these characteristic scales in your original raw data like in [[file/view/olr_lag_covariances.gif|olr_lag_covariances.gif]]?
Share a longitude-lag slice of your lagged co-variance matrix for your TWO fields. Label it, interpret it.
Assignment for Homework 3: Parts 5-6: characterizing co-variability between 2 fields.
5. Scatter plot, correlation and covariance, regression-explained variance
6. Lagged correlation, covariance, and cross-covariance: hey let's compute all vs. all
Rather than pick time series at one longitude to correlate with other time series, let's use matrix power to fill out a whole
2D (144x144 or 240x240) covariance matrix
or what the heck
3D arrays: the covariance matrix as a function of longitude and time lag (144 x 144 x NLAGS).
For these ENSO type anomalies, let's do NLAGS = 49 [-24 to 24 months]. Everything we might want to know about correlations among our variables is a slice or section of that big covariance array.
Let's work from the anomx and anomy arrays (built in part 1 by subtracting the climatological annual cycle at each longitude). This is traditional climate statistics, although notice that of course we could look at spatial instead of temporal covariances, define anomalies from spatial rather than temporal averages, etc. if we cared to.
Code for computing covariance arrays:
Same code lines for both Matlab and IDL:
MAXLAG = 24; % out to +/- 24 months
NLAGS = 2*MAXLAG + 1; % 49 lags are considered
%Matlab:
% deseasonalized anomalies
anomy = reshape(anomy,144,240);
anomx = reshape(anomx,144,240);
% make a lag vector, for a plot axis later
lags = linspace(-MAXLAG, MAXLAG, NLAGS); % lag vector, for plot axis
%% Matlab, upon reading our files, sees them as (144 x 240) arrays.
%% The first dimension, 144, is called "rows" for whatever reason,
%% while the second dimension, 240, is called "columns".
%% Statistics functions assume the "columns" are the structural dimension while
%% "rows" are the statistical dimension to be averaged over.
%% So if we want the TEMPORAL covariance of our TEMPORAL anomalies,
%% as a function of longitude, we need to transpose the second matrix rather than the first:
%% covxx = anomx*transpose(anomx) /240.;
%% OR we could just use the cov() or corrcoef() functions,
%% but we'd need to transpose: varx = cov( transpose(anomx));
%% Frustratingly, cov(x,y) and corrcoef(x,y) give stupid-sized outputs (2x2),
%% although cov(x) and corrcoef(x) are sensible. who wants that?? Frigging matlab.
%% Let's just do it more directly, manually, by matrix multiplication.
%%%% Declare the desired arrays: lagged temporal covariance, as function of space (144).
lagcovxy_temporal = zeros(144,144, NLAGS);
lagcovxx_temporal = zeros(144,144, NLAGS);
lagcovyy_temporal = zeros(144,144, NLAGS);
% explicit loop over lag, to fill the arrays of lagged covariance.
% Must trim off data within MAXLAG of the ends of the time series, so it's a bit fussy:
NTSTATS = 240. - NLAGS; % trimmed off ends, so sum is over 191 months, not 240
% for ilag = 1:NLAGS % lag index 1-49
for ilag = 1:NLAGS-1 % lag index 1-48 only!!! (bugs/ matlab sux)
% UMM BETTER JUST USE NLAGS-1 in the above because it chokes on 49
timelag = ilag-MAXLAG; % actual time lag, -24 to 24 monthslagcovxy_temporal(:,:, ilag) = anomx(:, MAXLAG:end-MAXLAG) * transpose( anomy(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS;
lagcovxx_temporal(:,:, ilag) = anomx(:, MAXLAG:end-MAXLAG) * transpose( anomx(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS;
lagcovyy_temporal(:,:, ilag) = anomy(:, MAXLAG:end-MAXLAG) * transpose( anomy(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS;
end
% Remember covariance is correlation coefficient times the standard
% deviations of the 2 variables. So make stdev 144 element arrays
% for converting covariances to correlations
stdx = squeeze(std(transpose(anomx))); % 144 element vector: std(anomx) at each longitude
stdy = squeeze(std(transpose(anomy)));
stdxrep = repmat(stdx,144,1); % replicate matrix to 144 x 144
stdyrep = repmat(stdy,144,1);
stdyy = (stdyrep .* transpose(stdyrep)); % elementwise multiplication gives stdev x stdev
stdxx = (stdxrep .* transpose(stdxrep)); % elementwise multiplication gives stdev x stdev
stdxy = (stdxrep .* transpose(stdyrep)); % elementwise multiplication gives stdev x stdev
% Make a few plots
figure(1)
contourf(lagcovxx_temporal(:,:,MAXLAG)) % zero-lag covariance
title('zero-lag COVARIANCE of anomx = deseasonalized olr anomalies')
figure(2)
contourf(lagcovxx_temporal(:,:,MAXLAG)./stdxx) % zero-lag covariance
title('zero-lag CORRELATION of anomx = deseasonalized olr anomalies')
NOTE ADDED AT END OF COURSE: Some error here as std is computed from144 values while cov is from trimmed values to leave room for lag.
see some corrections at http://mpo581-hw3-morestats.wikispaces.com/Results_sst_slp_Chen
figure(4)
plot(stdx)
title('OLR stdev as a function of lon')
% Make a few plots
figure(3)
contourf( squeeze(lagcovxx_temporal(:,80,:)) ) % covariance as a function of lag for longitude #80 = 197.5E
title('COVARIANCE of OLR (lon=197.5) vs lon and lag in [-24,24] months')
full code HW3_olr_precip_mapes.m
IDL:
MAXLAG = 24; % out to +/- 24 months
NLAGS = 2*MAXLAG + 1; % 49 lags are considere
lag = -MAXLAG + indgen(NLAGS)
anomx = reform(anomx,144,240)
anomy = reform(anomy,144,240)
lagcovxx_temporal = fltarr(144,144, NLAGS);
lagcovxy_temporal = fltarr(144,144, NLAGS);
lagcovyy_temporal = fltarr(144,144, NLAGS);
; covxx = matrix_multiply (anomx, transpose(anomx));THIS ONE IS 144x144
; %%% explicit loop to fill the array. Have to trim data within MAXLAG
; of the end of the time series.
NTSTATS = 240-NLAGS
.run
for ilag = 0, NLAGS-1 do begin
timelag = ilag-MAXLAG
lagcovxx_temporal(*,*, ilag) = matrix_multiply( anomx(*, MAXLAG :239-MAXLAG), $
transpose( anomy(*, MAXLAG+timelag:239-MAXLAG+timelag) ))/NTSTATS;
endfor
end
;;; Some plots
contour, anomx <19.9 >(-19.9), lon, time, /fill, levels = levels, tit='olr anomalies', xtit='lon'
contour, REFORM(lagcovxx_temporal(*,70,*)), lon, lag, /fill, nlev=23, xtit='lon', ytit='lag', levels=levels
;;; full code HW3_olr_precip_mapes.pro
6. Lagged correlation, covariance, and cross-covariance: questions